Skip to content

Update RNA-Seq Coverage.ipynb#2

Closed
aleighbrown wants to merge 1 commit intoBioJulia:masterfrom
aleighbrown:master
Closed

Update RNA-Seq Coverage.ipynb#2
aleighbrown wants to merge 1 commit intoBioJulia:masterfrom
aleighbrown:master

Conversation

@aleighbrown
Copy link
Copy Markdown

just changed a period to make it function properly. Just putting it out here to give it a nudge for updates.

just changed a period to make it function
@kescobo
Copy link
Copy Markdown
Member

kescobo commented Jan 3, 2020

Thanks! does this actually run now?

@aleighbrown
Copy link
Copy Markdown
Author

aleighbrown commented Jan 3, 2020

Nope..spoke too soon on that...

intersect isn't working as advertised. To be fair I'm not using the arabidopsis file but a subsampled version of my own data(mouse), and just checking coverage at Gapdh
samtools view /data/24wt_sub.bam chr6:125159852-125168467 | wc -l

6221

but with the as described method of using intersect

chrom = "chr6" leftpos = 125159852 rightpos = 125168467
intersect(, , ) returns an iterator of BAM records that overlaps the specified genomic range. Let's check that the number of returned records matches that of samtools view:

count(_ -> true, intersect(reader, chrom, leftpos:rightpos)) 0

Just in case it was something silly I've also checked chrom = "6", chrom = 6, chrom = "Chr6", chrom = '6'

@kescobo
Copy link
Copy Markdown
Member

kescobo commented Jan 3, 2020

Hmm - does the test arabidopsis data give the expected result? TBH, I don't deal with this type of data myself, so I won't be much help running it down. @CiaranOMara has all the commits on XAM.jl - can they help?

@CiaranOMara
Copy link
Copy Markdown
Member

I'll check whether I'm able to reproduce the results of the Arabidopsis thaliana data, then we may delve in from there.

@CiaranOMara
Copy link
Copy Markdown
Member

The tutorial is certainly out of date. @aleighbrown, I think you'd want to use the eachoverlap function from the BioAlignments v1 package (XAM isn't ready).

I've included my rough workflow below.

# Download Reference genome: Arabidopsis thaliana (assembly TAIR10.1) https://www.ncbi.nlm.nih.gov/genome/?term=txid3702[orgn].
path_genome = download("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.4_TAIR10.1/GCF_000001735.4_TAIR10.1_genomic.fna.gz", joinpath(@__DIR__, "GCF_000001735.4_TAIR10.1_genomic.fna.gz"))

# Unzip genome.
run(`gunzip $path_genome`)

# Download HISAT2 binary (http://ccb.jhu.edu/software/hisat2/index.shtml).
# path_hisat = download("ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-Linux_x86_64.zip")
path_hisat = download("ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-OSX_x86_64.zip")
# path_hisat = download("http://www.di.fc.ul.pt/~afalcao/hisat2.1/hisat2.1_Windows.zip")

# Unzip HISAT2.
run(`unzip $path_hisat -d $(@__DIR__)`)

# Build index.
run(`hisat2-2.1.0/hisat2-build GCF_000001735.4_TAIR10.1_genomic.fna GCF_000001735.4_TAIR10.1_genomic.fna`)

# Download SRR1238088.
run(`fasterq-dump SRR1238088`)

# Align.
run(`hisat2-2.1.0/hisat2 -x GCF_000001735.4_TAIR10.1_genomic.fna -1 SRR1238088_1.fastq -2 ./SRR1238088_2.fastq -S SRR1238088.sam`)

# Convert SAM to BAM.
run(`samtools view -Sb SRR1238088.sam -o SRR1238088.bam`)

# Sort BAM.
run(`samtools sort SRR1238088.bam -o SRR1238088.sort.bam`)

# Index BAM.
run(`samtools index SRR1238088.sort.bam`)

# The important bit.
using BioAlignments #v1.0.0

# AT3G11820
chrom = "NC_003074.8" # Chr3 (see  https://www.ncbi.nlm.nih.gov/genome/?term=txid3702[orgn])
leftpos = 3729305
rightpos = 3731116

bamfile = "SRR1238088.sort.bam"

open(BAM.Reader, bamfile, index=bamfile * ".bai") do reader
    count(_ -> true, eachoverlap(reader, chrom, leftpos:rightpos))
end

read(pipeline(`samtools view SRR1238088.sort.bam $chrom:$leftpos-$rightpos`, `wc -l`), String) |> strip

In terms of the "Chr6" stuff, you can get the names from the reader.

open(BAM.Reader, bamfile, index=bamfile * ".bai") do reader
    reader.refseqnames
end

"metadata": {},
"outputs": [],
"source": [
"using Bio.Align"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current package is BioAlignments v1.

@kescobo kescobo deleted the branch BioJulia:master June 8, 2022 15:42
@kescobo kescobo closed this Jun 8, 2022
@kescobo
Copy link
Copy Markdown
Member

kescobo commented Jun 8, 2022

Closed because I changed the default branch to main from master, but would be great to get this updated.

Probably now could use XAM.jl, but I would prefer to skip using other deps (like hisat) if possible, but if we can't, could use @CiaranOMara 's idea above

@kescobo kescobo mentioned this pull request Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants